Critical Edition of Sanskrit Texts
Abstract
A critical edition takes into account all known versions of a text in order to show the differences between any two distinct versions. Constructing a critical edition is long and sometimes tedious work. Software that helps the philologist in this task has long been available for European languages. However, no such software exists yet for Sanskrit, because its complex graphical characteristics imply computationally expensive solutions to the problems that arise in text comparison. This paper describes the characteristics of Sanskrit that make text comparison different from that of other languages, presents computationally feasible solutions for building a computer-assisted critical edition of Sanskrit texts, and provides, as a byproduct, a distance between two versions of the edited text. This distance can then be used to produce different kinds of classifications of the texts.
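As a rough illustration of what a distance between two versions of a text can look like, the sketch below computes a plain character-level edit (Levenshtein) distance and a length-normalized variant. This is an assumption made for illustration only: the abstract does not say how the paper defines its distance. The function names and the transliterated sample strings are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions and substitutions that turn a into b."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca by cb, or match
            ))
        prev = curr
    return prev[-1]


def normalized_distance(a: str, b: str) -> float:
    """Edit distance divided by the longer length, giving a value in [0, 1]."""
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0


if __name__ == "__main__":
    # Hypothetical transliterated readings from two manuscripts (no word spaces).
    version_a = "tapovanagatah"
    version_b = "tapovanegatah"
    print(edit_distance(version_a, version_b))
    print(normalized_distance(version_a, version_b))
```

Pairwise values of this kind, computed over all versions, form a distance matrix that a standard clustering method could take as input. Whether the paper proceeds that way is not stated in the abstract.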
Related articles
Comparing Sanskrit Texts for Critical Editions
Traditionally, Sanskrit is written without blanks, and sentences can run to thousands of characters without any separation. A critical edition takes into account all the different known versions of the same text in order to show the differences between any two distinct versions, in terms of words missing, changed or omitted. This paper describes the Sanskrit characteristics that make text comparisons di...
Comparing Sanskrit Texts for Critical Editions: The Sequences Move Problem
A critical edition takes into account various versions of the same text in order to show the differences between two distinct versions, in terms of words that have been missing, changed, omitted or displaced. Traditionally, Sanskrit is written without spaces between words, and the word order can be changed without altering the meaning of a sentence. This paper describes the characteristics whic...
Coarse Semantic Classification of Rare Nouns Using Cross-Lingual Data and Recurrent Neural Networks
The paper presents a method for WordNet supersense tagging of Sanskrit, an ancient Indian language with a corpus grown over four millennia. The proposed method merges lexical information from Sanskrit texts with lexicographic definitions from Sanskrit-English dictionaries, and compares the performance of two machine learning methods for this task. Evaluation concentrates on Vedic, the oldest lay...
SanskritTagger: A Stochastic Lexical and POS Tagger for Sanskrit
SanskritTagger is a stochastic tagger for unpreprocessed Sanskrit text. The tagger tokenises text with a Markov model and performs part-of-speech tagging with a Hidden Markov model. Parameters for these processes are estimated from a manually annotated corpus of currently about 1,500,000 words. The article sketches the tagging process, reports the results of tagging a few short passages of Sans...
A New Computational Schema for Euphonic Conjunctions in Sanskrit Processing
Automated language processing is central to the drive to enable facilitated referencing of increasingly available Sanskrit E-texts. The first step towards processing Sanskrit text involves the handling of Sanskrit compound words that are an integral part of Sanskrit texts. This firstly necessitates the processing of euphonic conjunctions or sandhi-s, which are points in words or between words, ...